
    Online Matrix Completion Through Nuclear Norm Regularisation

    The main goal of this paper is to propose a novel method for performing matrix completion online. Motivated by a wide variety of applications, ranging from the design of recommender systems to sensor network localization and seismic data reconstruction, we consider the matrix completion problem when entries of the matrix of interest are observed gradually. Precisely, we place ourselves in the situation where the predictive rule should be refined incrementally, rather than recomputed from scratch each time the sample of observed entries grows. The extension of existing matrix completion methods to the sequential prediction context is a major issue in the Big Data era, yet one little addressed in the literature. The algorithm promoted in this article builds upon the Soft-Impute approach introduced in Mazumder et al. (2010). The major novelty arises from the use of a randomised technique for both computing and updating the Singular Value Decomposition (SVD) involved in the algorithm. Though disarmingly simple, the proposed method turns out to be very efficient while requiring reduced computation. Several numerical experiments on real datasets illustrate its performance, together with preliminary results giving the method a theoretical basis.
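
    The Soft-Impute iteration at the heart of the method alternates between filling the unobserved entries with the current low-rank estimate and soft-thresholding the singular values of the completed matrix; the paper's contribution is to make the SVD step cheap via randomisation. The following Python sketch combines a plain batch Soft-Impute loop with a randomised truncated SVD; it is an illustration under these assumptions, not the authors' incremental update rule, and all function and parameter names are ours.

        import numpy as np

        def randomized_svd(A, rank, n_oversamples=10):
            """Truncated SVD via randomised range finding (Halko et al. style)."""
            omega = np.random.randn(A.shape[1], rank + n_oversamples)
            Q, _ = np.linalg.qr(A @ omega)        # orthonormal basis for range(A)
            U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
            return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank, :]

        def soft_impute(M, mask, lam, rank, n_iters=50):
            """Soft-Impute (Mazumder et al., 2010): iterate between imputing
            missing entries and soft-thresholding singular values."""
            X = np.zeros_like(M)
            for _ in range(n_iters):
                Y = np.where(mask, M, X)          # observed entries + current fill
                U, s, Vt = randomized_svd(Y, rank)
                s = np.maximum(s - lam, 0.0)      # shrink singular values by lam
                X = (U * s) @ Vt                  # low-rank reconstruction
            return X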

    Plasma Edge Kinetic-MHD Modeling in Tokamaks Using Kepler Workflow for Code Coupling, Data Management and Visualization

    A new predictive computer simulation tool targeting the development of the H-mode pedestal at the plasma edge in tokamaks and the triggering and dynamics of edge localized modes (ELMs) is presented in this report. This tool brings together, in a coordinated and effective manner, several first-principles physics simulation codes, stability analysis packages, and data processing and visualization tools. A Kepler workflow is used to carry out an edge plasma simulation that loosely couples the kinetic code, XGC0, with an ideal MHD linear stability analysis code, ELITE, and an extended MHD initial-value code such as M3D or NIMROD. XGC0 includes the neoclassical ion-electron-neutral dynamics needed to simulate pedestal growth near the separatrix. The Kepler workflow processes the XGC0 simulation results into simple images that can be selected and displayed via the Dashboard, a monitoring tool implemented in AJAX that allows the scientist to track computational resources, examine running and archived jobs, and view key physics data, all within a standard Web browser. The XGC0 simulation is monitored for the conditions needed to trigger an ELM crash by periodically assessing the edge plasma pressure and current density profiles using the ELITE code. If an ELM crash is triggered, the Kepler workflow launches the M3D code on a moderate-size Opteron cluster to simulate the nonlinear ELM crash and to compute the relaxation of plasma profiles after the crash. This process is monitored through periodic outputs of plasma fluid quantities that are automatically visualized with AVS/Express and may be displayed on the Dashboard. Finally, the Kepler workflow archives all data outputs and processed images using HPSS, as well as provenance information about the software and hardware used to create the simulation. The complete process of preparing, executing and monitoring a coupled-code simulation of the edge pressure pedestal buildup and the ELM cycle using the Kepler scientific workflow system is described in this paper.
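
    In outline, the coupled simulation is a loop: advance XGC0, periodically assess the edge profiles for ELM onset with ELITE, and hand off to M3D when a crash is triggered. The Python sketch below only schematises that loop; every function in it is a hypothetical stand-in, not part of the Kepler workflow's actual interfaces.

        import random

        def run_xgc0_step():
            """Hypothetical stand-in for one XGC0 advance, returning edge profiles."""
            return {"pressure_gradient": random.random()}

        def elite_is_unstable(profiles):
            """Hypothetical stand-in for the periodic ELITE stability assessment."""
            return profiles["pressure_gradient"] > 0.95

        def elm_cycle(max_steps=1000):
            for step in range(max_steps):
                profiles = run_xgc0_step()        # advance the kinetic edge code
                # (Dashboard image publishing and HPSS archiving omitted here.)
                if elite_is_unstable(profiles):   # ELM trigger condition met
                    print(f"step {step}: launch nonlinear MHD crash simulation")
                    break

        elm_cycle()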

    RDF Querying

    Reactive Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages, and in many cases require composite event detection: it is not sufficient to react to single event messages; rather, events have to be considered in relation to other events received over time. Emphasizing language design and formal semantics, we describe the rule-based query language XChangeEQ for detecting composite events. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as model and fixpoint theories; while this is an established approach for rule languages, it has not been applied to event queries before.
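
    The contribution of the paper is the language design and its formal semantics, so no XChangeEQ syntax is reproduced here. Purely to illustrate what a composite event query must handle, the Python sketch below pairs events across a stream: it touches event composition (pairing), a temporal relationship (a sliding window), and event accumulation (remembering earlier events); all names and data are invented.

        from dataclasses import dataclass

        @dataclass
        class Event:
            kind: str            # event data, reduced here to a type tag
            timestamp: float

        def detect_composite(stream, first, second, window):
            """Report every `second` event preceded by a `first` event
            within `window` time units."""
            pending = []                           # accumulated `first` events
            for ev in stream:
                if ev.kind == first:
                    pending.append(ev)
                elif ev.kind == second:
                    # temporal relationship: keep antecedents inside the window
                    pending = [p for p in pending
                               if ev.timestamp - p.timestamp <= window]
                    for p in pending:              # event composition: emit pairs
                        yield (p, ev)

        events = [Event("order", 0.0), Event("payment", 1.5),
                  Event("order", 3.0), Event("payment", 9.0)]
        for pair in detect_composite(events, "order", "payment", window=2.0):
            print(pair)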

    Verbalizing phylogenomic conflict: Representation of node congruence across competing reconstructions of the neoavian explosion

    Phylogenomic research is accelerating the publication of landmark studies that aim to resolve deep divergences of major organismal groups. Meanwhile, systems for identifying and integrating the products of phylogenomic inference, such as newly supported clade concepts, have not kept pace. However, the ability to verbalize node concept congruence and conflict across multiple, in effect simultaneously endorsed, phylogenomic hypotheses is a prerequisite for building synthetic data environments for biological systematics and other domains impacted by these conflicting inferences. Here we develop a novel solution to the conflict verbalization challenge, based on a logic representation and reasoning approach that utilizes the language of Region Connection Calculus (RCC-5) to produce consistent alignments of node concepts endorsed by incongruent phylogenomic studies. The approach employs clade concept labels to individuate concepts used by each source, even if these carry identical names. Indirect RCC-5 modeling of intensional (property-based) node concept definitions, facilitated by the local relaxation of coverage constraints, allows parent concepts to attain congruence in spite of their differentially sampled children. To demonstrate the feasibility of this approach, we align two recent phylogenomic reconstructions of higher-level avian groups that entail strong conflict in the "neoavian explosion" region. According to our representations, this conflict is constituted by 26 instances of input "whole concept" overlap. These instances are further resolvable in the output labeling schemes and visualizations as "split concepts", which provide the labels and relations needed to build truly synthetic phylogenomic data environments. Because the RCC-5 alignments fundamentally reflect the trained, logic-enabled judgments of systematic experts, future designs for such environments need to promote a culture where experts routinely assess the intensionalities of node concepts published by our peers, even and especially when we are not in agreement with each other.
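
    RCC-5 provides exactly five base relations between two regions, here node concepts: congruence (==), proper part (<), inverse proper part (>), partial overlap (><), and disjointness (!). The Python sketch below computes the relation for the purely extensional case in which each concept is reduced to its set of sampled terminals; the paper's alignments additionally reason over intensional definitions and relaxed coverage constraints, which this sketch does not attempt, and the example taxa are invented.

        def rcc5(a: set, b: set) -> str:
            """RCC-5 base relation between two node concepts, each reduced
            to its set of sampled terminal taxa (extensional case only)."""
            if a == b:
                return "congruent (==)"
            if a < b:
                return "proper part (<)"
            if a > b:
                return "inverse proper part (>)"
            if a & b:
                return "partial overlap (><)"
            return "disjoint (!)"

        # Two conflicting clade concepts from incongruent reconstructions:
        study_a_clade = {"hoatzin", "cranes", "doves"}
        study_b_clade = {"hoatzin", "cranes", "swifts"}
        print(rcc5(study_a_clade, study_b_clade))   # partial overlap (><)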

    Finding and sharing GIS methods based on the questions they answer

    Geographic information has become central for data scientists of many disciplines to put their analyses into a spatio-temporal perspective. However, just as the volume and variety of data sources on the Web grow, it becomes increasingly harder for analysts to be familiar with all the available geospatial tools, including toolboxes in Geographic Information Systems (GIS), R packages, and Python modules. Even though the semantics of the questions answered by these tools can be broadly shared, tools and data sources are still divided by syntax and platform-specific technicalities. It would, therefore, be hugely beneficial for information science if analysts could simply ask questions in generic and familiar terms to obtain the tools and data necessary to answer them. In this article, we systematically investigate the analytic questions that lie behind a range of common GIS tools, and we propose a semantic framework to match analytic questions and tools that are capable of answering them. To support the matching process, we define a tractable subset of SPARQL, the query language of the Semantic Web, and we propose and test an algorithm for computing query containment. We illustrate the identification of tools to answer user questions on a set of common user requests.
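
    The paper defines its own tractable SPARQL subset and containment algorithm; the sketch below instead shows the classical idea such algorithms build on, the homomorphism test for conjunctive queries over triple patterns: query Q1 is contained in Q2 if Q2's body maps into Q1's body while fixing the answer variables. The brute-force search and the example vocabulary are ours.

        from itertools import product

        def is_var(term):
            return isinstance(term, str) and term.startswith("?")

        def contained_in(q1, q2, answer_vars):
            """True if q1's answers are contained in q2's on every dataset,
            tested via a homomorphism from q2 into q1 (brute force)."""
            q2_vars = sorted({t for triple in q2 for t in triple if is_var(t)})
            q1_terms = {t for triple in q1 for t in triple}
            q1_set = {tuple(t) for t in q1}
            for image in product(q1_terms, repeat=len(q2_vars)):
                h = dict(zip(q2_vars, image))
                if any(h.get(v, v) != v for v in answer_vars):
                    continue                       # answer variables stay fixed
                if all(tuple(h.get(t, t) for t in tr) in q1_set for tr in q2):
                    return True
            return False

        # "buffers computed from roads" asks more than "any buffer result",
        # so its answers are a subset of the latter's.
        q1 = [("?x", "derivedBy", "Buffer"), ("?x", "input", "Roads")]
        q2 = [("?x", "derivedBy", "Buffer")]
        print(contained_in(q1, q2, {"?x"}))        # True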

    Expressiveness and complexity of XML publishing transducers

    A number of languages have been developed for specifying XML publishing, i.e., transformations of relational data into XML trees. These languages generally describe the behaviors of a middleware controller that builds an output tree iteratively, issuing queries to a relational source and expanding the tree with the query results at each step. To study the complexity and expressive power of XML publishing languages, this paper proposes a notion of publishing transducers. Unlike automata for querying XML data, a publishing transducer generates a new XML tree rather than performing a query on an existing tree. We study a variety of publishing transducers based on what relational queries a transducer can issue, what temporary stores a transducer can use during tree generation, and whether or not some tree nodes are allowed to be virtual, i.e., excluded from the output tree. We first show how existing XML publishing languages can be characterized by such transducers. We then study the membership, emptiness and equivalence problems for various classes of transducers and existing publishing languages. We establish lower and upper bounds, all matching except one, ranging from PTIME to undecidable. Finally, we investigate the expressive power of these transducers and existing languages. We show that when treated as relational query languages, different classes of transducers capture either complexity classes (e.g., PSPACE) or fragments of datalog (e.g., linear datalog). For tree generation, we establish connections between publishing transducers and logical transductions.
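
    As a toy illustration of the transducer model only, not of any language studied in the paper, the Python sketch below plays the middleware controller: it builds an output XML tree top-down, issuing a relational query at each node and expanding the tree with one child per result tuple. The schema and data are invented.

        import sqlite3
        from xml.etree.ElementTree import Element, SubElement, tostring

        db = sqlite3.connect(":memory:")
        db.executescript("""
            CREATE TABLE dept(id INTEGER, name TEXT);
            CREATE TABLE emp(dept_id INTEGER, name TEXT);
            INSERT INTO dept VALUES (1, 'CS'), (2, 'Math');
            INSERT INTO emp VALUES (1, 'Ada'), (1, 'Alan'), (2, 'Emmy');
        """)

        root = Element("company")                          # output tree root
        for dept_id, dept_name in db.execute("SELECT id, name FROM dept"):
            d = SubElement(root, "dept", name=dept_name)   # expand per tuple
            for (emp,) in db.execute(
                    "SELECT name FROM emp WHERE dept_id = ?", (dept_id,)):
                SubElement(d, "emp").text = emp            # child per result
        print(tostring(root, encoding="unicode"))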

    What the Web Has Done for Scientific Data – and What It Hasn’t

    The web, together with database technology, has radically changed the way scientific research is conducted. Scientists now have access to an unprecedented quantity and range of data, and the speed and ease of communicating all forms of scientific data have increased hugely. This change has come at a price. Web and database technology no longer support some of the desirable properties of paper publication, and they have introduced new problems in maintaining the scientific record. This brief paper is an examination of some of these issues.

    Computing environments for reproducibility: Capturing the 'Whole Tale'

    The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process, transforming knowledge discovery and dissemination into a process where data products are united with research articles to create “living publications” or tales. The Whole Tale focuses on the full spectrum of science, empowering both users in the long tail of science and power users who demand access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment.